Movement and pose assessment of newborns lets experienced pediatricians predict neurodevelopmental disorders, allowing early intervention for related diseases. However, most state-of-the-art AI approaches to human pose estimation focus on adults, and publicly available benchmarks for infant pose estimation are lacking. In this paper, we fill this gap by proposing an infant pose dataset and a Deep Aggregation Vision Transformer (AggPose) for human pose estimation, which introduces a fast-trained full-transformer framework that does not rely on convolution operations to extract features in its early stages. It generalizes Transformer + MLP to high-resolution deep-layer aggregation within feature maps, enabling information fusion between different vision levels. We pre-train AggPose on the COCO pose dataset and apply it to our newly released large-scale infant pose estimation dataset. The results show that AggPose effectively learns multi-scale features across different resolutions and significantly improves the performance of infant pose estimation. We show that AggPose outperforms the hybrid models HRFormer and TokenPose on the infant pose estimation dataset. Moreover, AggPose yields a 0.8 AP improvement on COCO val pose estimation. Our code is available at github.com/szar-lab/aggpose.
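As a rough illustration of the kind of cross-resolution fusion the abstract describes (Transformer + MLP aggregation over feature maps), the following PyTorch sketch fuses tokens from a low-resolution stream into a high-resolution stream with an MLP. The module names, shapes, and fusion rule are assumptions for illustration, not the authors' AggPose implementation.

```python
# Hypothetical sketch of Transformer-style cross-resolution fusion with an MLP
# (illustrative only; not the authors' AggPose code).
import torch
import torch.nn as nn

class CrossResolutionFusion(nn.Module):
    """Fuse token maps from two resolutions with a residual MLP."""
    def __init__(self, dim_high, dim_low):
        super().__init__()
        self.proj = nn.Linear(dim_low, dim_high)   # align channel widths
        self.mlp = nn.Sequential(
            nn.LayerNorm(dim_high),
            nn.Linear(dim_high, dim_high * 4),
            nn.GELU(),
            nn.Linear(dim_high * 4, dim_high),
        )

    def forward(self, high, low):
        # high: (B, H*W, C_high) tokens at high resolution
        # low:  (B, h*w, C_low)  tokens at low resolution
        B, N, C = high.shape
        side = int(N ** 0.5)
        low = self.proj(low)                                   # (B, h*w, C_high)
        low_map = low.transpose(1, 2)                          # (B, C_high, h*w)
        s = int(low_map.shape[-1] ** 0.5)
        low_map = low_map.reshape(B, C, s, s)
        up = nn.functional.interpolate(low_map, size=(side, side), mode="nearest")
        up = up.flatten(2).transpose(1, 2)                     # back to tokens
        return high + self.mlp(high + up)                      # residual fusion

fusion = CrossResolutionFusion(dim_high=64, dim_low=128)
out = fusion(torch.randn(2, 64 * 64, 64), torch.randn(2, 32 * 32, 128))
print(out.shape)  # torch.Size([2, 4096, 64])
```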
This paper proposes Friedrichs learning, a novel deep learning methodology that learns the weak solutions of PDEs via a minimax formulation, which transforms the PDE problem into a minimax optimization problem to identify weak solutions. The name "Friedrichs learning" highlights the close relationship between our learning strategy and Friedrichs' theory on symmetric systems of PDEs. The weak solution and the test function in the weak formulation are parameterized as deep neural networks in a mesh-free manner and are alternately updated to approximate the weak solution and the optimal test function, respectively. Extensive numerical results indicate that our mesh-free method can provide reasonably good solutions to a wide range of PDEs defined on regular and irregular domains in various dimensions, where classical numerical methods such as finite difference and finite element methods may be tedious or difficult to apply.
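One way to make the minimax idea concrete: for a PDE written as Au = f with formal adjoint A*, the weak formulation requires that the pairing of u with A*φ equal the pairing of f with φ for all admissible test functions φ, which suggests a schematic objective of the following form (notation assumed here; the exact loss and norms used in the paper may differ).

```latex
% Schematic minimax objective: u_theta is the solution network,
% varphi_eta the test-function network (assumed notation).
\[
\min_{\theta}\;\max_{\eta}\;
  \frac{\bigl|\langle u_\theta,\, A^{*}\varphi_\eta\rangle_{L^2(\Omega)}
        - \langle f,\, \varphi_\eta\rangle_{L^2(\Omega)}\bigr|^{2}}
       {\|\varphi_\eta\|_{L^2(\Omega)}^{2}}
\]
```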
The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing channel-level sparsity in the network in a simple but effective way. Different from many existing approaches, the proposed method directly applies to modern CNN architectures, introduces minimum overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming, which takes wide and large networks as input models, but during training insignificant channels are automatically identified and pruned afterwards, yielding thin and compact models with comparable accuracy. We empirically demonstrate the effectiveness of our approach with several state-of-the-art CNN models, including VGGNet, ResNet and DenseNet, on various image classification datasets. For VGGNet, a multi-pass version of network slimming gives a 20× reduction in model size and a 5× reduction in computing operations.
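The abstract does not spell out the sparsity mechanism; the sketch below assumes the commonly cited formulation of network slimming, an L1 penalty on BatchNorm scale factors followed by global thresholding, and is illustrative rather than the paper's exact training and pruning recipe.

```python
# Minimal sketch of channel-level sparsity via an L1 penalty on BatchNorm
# scale factors (gamma), one common reading of network slimming.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
)

def sparsity_penalty(model, lam=1e-4):
    """L1 penalty on every BatchNorm scale (gamma); drives channels toward zero."""
    return lam * sum(bn.weight.abs().sum()
                     for bn in model.modules() if isinstance(bn, nn.BatchNorm2d))

x = torch.randn(8, 3, 32, 32)
loss = model(x).pow(2).mean() + sparsity_penalty(model)   # dummy task loss
loss.backward()

# After training, channels whose gamma falls below a global threshold are pruned.
gammas = torch.cat([bn.weight.detach().abs()
                    for bn in model.modules() if isinstance(bn, nn.BatchNorm2d)])
threshold = torch.quantile(gammas, 0.5)        # e.g. prune the smallest 50%
keep = [bn.weight.detach().abs() > threshold
        for bn in model.modules() if isinstance(bn, nn.BatchNorm2d)]
print([int(mask.sum()) for mask in keep])      # surviving channels per layer
```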
Dense retrievers have made significant strides in obtaining state-of-the-art results on text retrieval and open-domain question answering (ODQA). Yet most of these achievements were made possible with the help of large annotated datasets, and unsupervised learning for dense retrieval models remains an open problem. In this work, we explore two categories of methods for creating pseudo query-document pairs, namely query extraction (QExt) and transferred query generation (TQGen), to augment retriever training in an annotation-free and scalable manner. Specifically, QExt extracts pseudo queries from document structure or by selecting salient random spans, and TQGen utilizes generation models trained for other NLP tasks (e.g., summarization) to produce pseudo queries. Extensive experiments show that dense retrievers trained with individual augmentation methods perform comparably to multiple strong baselines, and combining them leads to further improvements, achieving state-of-the-art unsupervised dense retrieval performance on both BEIR and ODQA datasets.
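A toy sketch of the annotation-free spirit of QExt follows; the specific heuristics (the first line as a structural query, fixed-length random spans) are assumptions for illustration, not the paper's exact extraction rules.

```python
# Toy pseudo-query creation: one query from document structure plus random spans.
import random

def extract_pseudo_queries(document: str, num_spans: int = 2, span_len: int = 8):
    """Return the first non-empty line (a structural cue) plus random spans."""
    lines = [l.strip() for l in document.splitlines() if l.strip()]
    queries = [lines[0]] if lines else []          # document-structure query
    tokens = document.split()
    for _ in range(num_spans):                     # random-span queries
        if len(tokens) > span_len:
            start = random.randrange(len(tokens) - span_len)
            queries.append(" ".join(tokens[start:start + span_len]))
    return queries

doc = ("Unsupervised Dense Retrieval\n"
       "Dense retrievers map queries and documents into a shared vector space "
       "and rank documents by similarity.")
for q in extract_pseudo_queries(doc):
    print(q)
```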
Quantifying the perceptual similarity of two images is a long-standing problem in low-level computer vision. The natural image domain commonly relies on supervised learning, e.g., a pre-trained VGG, to obtain a latent representation. However, due to domain shift, pre-trained models from the natural image domain might not apply to other image domains, such as medical imaging. Notably, in medical imaging, evaluating the perceptual similarity is exclusively performed by specialists trained extensively in diverse medical fields. Thus, medical imaging remains devoid of task-specific, objective perceptual measures. This work answers the question: Is it necessary to rely on supervised learning to obtain an effective representation that could measure perceptual similarity, or is self-supervision sufficient? To understand whether recent contrastive self-supervised representation (CSR) may come to the rescue, we start with natural images and systematically evaluate CSR as a metric across numerous contemporary architectures and tasks and compare them with existing methods. We find that in the natural image domain, CSR behaves on par with the supervised one on several perceptual tests as a metric, and in the medical domain, CSR better quantifies perceptual similarity concerning the experts' ratings. We also demonstrate that CSR can significantly improve image quality in two image synthesis tasks. Finally, our extensive results suggest that perceptuality is an emergent property of CSR, which can be adapted to many image domains without requiring annotations.
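To make the metric usage concrete, the sketch below scores perceptual similarity as one minus the cosine similarity of encoder embeddings; the tiny encoder is a placeholder for a contrastive self-supervised backbone, and the distance definition is an assumption rather than the paper's exact evaluation protocol.

```python
# Hedged sketch: an encoder used as a perceptual metric via cosine distance.
import torch
import torch.nn as nn

encoder = nn.Sequential(              # placeholder for a CSR backbone (e.g. SimCLR weights)
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

def perceptual_distance(img_a, img_b):
    """1 - cosine similarity between embeddings; lower means 'more similar'."""
    with torch.no_grad():
        za, zb = encoder(img_a), encoder(img_b)
    return 1.0 - nn.functional.cosine_similarity(za, zb, dim=1)

a, b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
print(float(perceptual_distance(a, b)))
```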
Automated analysis of electron microscopy datasets poses multiple challenges, such as limited training dataset sizes and shifts in data distribution caused by variations in sample quality and experimental conditions. It is desirable that trained models continue to provide acceptable segmentation/classification performance on new data and quantify the uncertainty associated with their predictions. Various approaches have been adopted to quantify uncertainty across the broad range of machine learning applications, such as Bayesian modeling, Monte Carlo dropout, and ensembles. With the objective of addressing challenges specific to the electron microscopy data domain, two different types of ensembles of pre-trained neural networks are implemented in this work. The ensembles perform semantic segmentation of ice crystals in a two-phase mixture, thereby tracking their phase transition to water. The first ensemble (EA) consists of U-Net-style networks with different underlying architectures, whereas the second type of ensemble (ER-i) consists of randomly initialized U-Net-style networks whose base learners all share the same underlying architecture 'i'. The encoders of the base learners are pre-trained on the ImageNet dataset. The performance of EA and ER is evaluated on three different metrics: accuracy, calibration, and uncertainty. EA is found to achieve higher classification accuracy and better calibration than ER. Although the uncertainty quantification of the two types of ensembles is comparable, the uncertainty scores exhibited by ER depend on the specific architecture of its base members ('i') and are not consistently better than those of EA. Thus, an ensemble design like EA appears better suited than one like ER to addressing the challenges posed by the analysis of electron microscopy datasets.
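A minimal sketch of the EA-style idea (averaging members with different architectures and reading off a per-pixel uncertainty) follows; the tiny networks and the entropy-based uncertainty score are illustrative assumptions, not the study's actual U-Net ensembles or metrics.

```python
# Minimal sketch of ensemble prediction and per-pixel uncertainty for binary segmentation.
import torch
import torch.nn as nn

def tiny_segnet(width):
    return nn.Sequential(
        nn.Conv2d(1, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, 1, 3, padding=1),
    )

members = [tiny_segnet(w) for w in (8, 16, 32)]   # stand-ins for U-Net variants

def ensemble_predict(x):
    probs = torch.stack([torch.sigmoid(m(x)) for m in members]).mean(dim=0)
    # Predictive entropy of the averaged probability as an uncertainty score.
    eps = 1e-8
    entropy = -(probs * (probs + eps).log() + (1 - probs) * (1 - probs + eps).log())
    return probs, entropy

x = torch.randn(1, 1, 64, 64)          # a grayscale micrograph patch
probs, unc = ensemble_predict(x)
print(probs.shape, unc.shape)
```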
Partial least squares regression (PLSR) is a widely used statistical model that reveals the linear relationship between latent factors of the independent and dependent variables. However, traditional methods for solving the PLSR model are typically based on Euclidean space and easily fall into local minima. To this end, we propose a new method to solve partial least squares regression via optimization on the bi-Grassmann manifold (PLSRbiGr). Specifically, we first leverage a three-factor SVD-type decomposition of the cross-covariance matrix defined on the bi-Grassmann manifold, converting the orthogonality-constrained optimization problem into an unconstrained optimization problem on the bi-Grassmann manifold, and then incorporate Riemannian preconditioning of matrix scaling to regulate the Riemannian metric at each iteration. PLSRbiGr is validated through a variety of experiments for decoding EEG signals in motor imagery (MI) and steady-state visual evoked potential (SSVEP) tasks. Experimental results demonstrate that PLSRbiGr outperforms competing algorithms in multiple EEG decoding tasks, which will greatly facilitate learning from small-sample data.
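Schematically, and with notation assumed here rather than taken from the paper, the decomposition can be pictured as an SVD-type factorization whose two orthonormal factors are identified with points on a product of two Grassmann manifolds (the "bi-Grassmann" manifold):

```latex
% Assumed notation: X (n x p) and Y (n x q) are the data matrices,
% k is the number of latent factors.
\[
X^{\top} Y \;\approx\; U \, \Sigma \, V^{\top},
\qquad
[U] \in \mathrm{Gr}(p, k), \quad [V] \in \mathrm{Gr}(q, k),
\qquad
([U], [V]) \in \mathrm{Gr}(p, k) \times \mathrm{Gr}(q, k)
\]
```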
Medical image synthesis has attracted increasing attention because it can generate missing image data, improving diagnosis and benefiting many downstream tasks. However, the synthesis models developed so far do not adapt well to unseen data distributions that exhibit domain shift, limiting their applicability in clinical routine. This work focuses on exploring domain adaptation (DA) of 3D image-to-image synthesis models. First, we highlight the technical differences in DA between classification, segmentation, and synthesis models. Second, we propose a novel and efficient adaptation method based on a 2D variational autoencoder that approximates 3D distributions. Third, we present empirical studies on the effects of the amount of adaptation data and of key hyperparameters. Our results show that the proposed method can significantly improve synthesis accuracy on unseen domains in the 3D setting. The code is publicly available at https://github.com/winstonhutiger/2d_vae_uda_for_3d_sythesis.
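The sketch below illustrates, under heavy simplification, how a 2D VAE can be trained on slices of a 3D volume so that its latent model approximates the 3D distribution; all shapes, modules, and the slicing scheme are assumptions for illustration, not the proposed adaptation method itself.

```python
# Rough sketch: treat axial slices of a 3D volume as a batch of 2D samples for a VAE.
import torch
import torch.nn as nn

class Slice2DVAE(nn.Module):
    def __init__(self, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, latent), nn.Linear(256, latent)
        self.dec = nn.Linear(latent, 64 * 64)

    def forward(self, x):                           # x: (B, 1, 64, 64) slices
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        recon = self.dec(z).view_as(x)
        return recon, mu, logvar

volume = torch.randn(1, 1, 16, 64, 64)              # a 3D scan: 16 axial slices
slices = volume.squeeze(0).permute(1, 0, 2, 3)      # (16, 1, 64, 64): slices as a batch
recon, mu, logvar = Slice2DVAE()(slices)
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
loss = (recon - slices).pow(2).mean() + kl
print(loss.item())
```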
The emerging class of instance-optimized systems has shown potential for high performance by specializing to a specific data and query workload. In particular, machine learning (ML) techniques have been applied successfully to build various instance-optimized components (e.g., learned indexes). This paper investigates how ML techniques can enhance the performance of spatial indexes, particularly the R-tree, for a given data and query workload. When the regions covered by R-tree index nodes overlap in space, a search for a specific point in that space may have to explore multiple paths from the root to the leaves; in the worst case, the entire R-tree may be searched. In this paper, we define and use the overlap ratio to quantify the degree of extraneous leaf-node accesses required by a range query. The goal is to improve the query performance of a traditional R-tree for highly overlapping range queries, as they tend to incur long running times. We introduce a new AI-tree that transforms the search operation of an R-tree into a multi-label classification task so that extraneous leaf-node accesses are excluded. We then augment a traditional R-tree with the AI-tree to form a hybrid "AI+R"-tree. The "AI+R"-tree can automatically differentiate between high-overlap and low-overlap queries using a learned model; thus, it processes high-overlap queries with the AI-tree and low-overlap queries with the R-tree. Experiments on real datasets demonstrate that the "AI+R"-tree can improve query performance over a traditional R-tree by up to 500%.
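The routing idea can be sketched as follows; the overlap-ratio formula and the component interfaces here are plausible readings of the abstract, not the paper's actual definitions or API.

```python
# Illustrative sketch: route a range query either to a classifier-style AI-tree
# (predicted high overlap) or to the plain R-tree (predicted low overlap).
from dataclasses import dataclass

@dataclass
class RangeQuery:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def overlap_ratio(true_leaves: int, visited_leaves: int) -> float:
    """One plausible reading: fraction of visited leaf nodes that were extraneous."""
    return 0.0 if visited_leaves == 0 else (visited_leaves - true_leaves) / visited_leaves

def intersects(rec, q: RangeQuery):
    xmin, ymin, xmax, ymax = rec
    return not (xmax < q.xmin or xmin > q.xmax or ymax < q.ymin or ymin > q.ymax)

class HybridAIRTree:
    def __init__(self, rtree_search, ai_tree_predict, overlap_classifier, threshold=0.5):
        self.rtree_search = rtree_search          # classic R-tree range search
        self.ai_tree_predict = ai_tree_predict    # multi-label leaf-node predictor
        self.overlap_classifier = overlap_classifier
        self.threshold = threshold

    def search(self, q: RangeQuery):
        if self.overlap_classifier(q) >= self.threshold:   # predicted high overlap
            leaves = self.ai_tree_predict(q)               # visit only predicted leaves
            return [rec for leaf in leaves for rec in leaf if intersects(rec, q)]
        return self.rtree_search(q)                        # fall back to the R-tree
```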
Graph neural networks (GNNs) have gained wide popularity for various analytical tasks on graph-structured data (i.e., networks). Typical GNNs and their variants follow a message-passing scheme that obtains network representations by propagating features along the network topology; however, they ignore the rich textual semantics (e.g., local word sequences) present in many real-world networks. Existing methods for text-rich networks integrate textual semantics mainly by utilizing internal information such as topics or phrases/words, which often fails to mine textual semantics comprehensively and thus limits the reciprocal guidance between network structure and textual semantics. To address these problems, we propose a novel Text-rich graph neural network with External Knowledge (TEKO) that takes full advantage of both structural and textual information in text-rich networks. Specifically, we first propose a flexible heterogeneous semantic network that incorporates high-quality entities and the interactions among documents and entities. We then introduce two types of external knowledge, namely structured triplets and unstructured entity descriptions, to gain a deeper insight into textual semantics. We further design a reciprocal convolutional mechanism for the constructed heterogeneous semantic network, enabling network structure and textual semantics to collaborate with each other and learn high-level network representations. Extensive experimental results on four public text-rich networks as well as a large-scale e-commerce search dataset illustrate the superiority of TEKO over state-of-the-art baselines.
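As a toy picture of reciprocal document-entity message passing (a heavily simplified stand-in for TEKO's reciprocal convolutional mechanism, with all shapes and layers assumed), consider one round of mutual aggregation on a bipartite document-entity graph:

```python
# Toy sketch: documents aggregate neighboring entities and vice versa.
import torch
import torch.nn as nn

docs = torch.randn(4, 16)                 # 4 document nodes, 16-d features
ents = torch.randn(6, 16)                 # 6 entity nodes (from external knowledge)
adj = (torch.rand(4, 6) > 0.5).float()    # document-entity incidence matrix

doc_from_ent = nn.Linear(16, 16)
ent_from_doc = nn.Linear(16, 16)

deg_d = adj.sum(1, keepdim=True).clamp(min=1)       # document degrees
deg_e = adj.sum(0, keepdim=True).clamp(min=1).T     # entity degrees
docs_next = torch.relu(docs + doc_from_ent(adj @ ents / deg_d))
ents_next = torch.relu(ents + ent_from_doc(adj.T @ docs / deg_e))
print(docs_next.shape, ents_next.shape)
```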